Be careful men
Search every cook and nanny
Uh, hook and granny
Uh, crooked fan...
uh, search everywhere!

Doc (Snow White and the Seven Dwarfs, Walt Disney, 1937)

Abstract 

In this paper we introduce a new algorithm to study some NP-complete problems.
This algorithm is a Markov Chain Monte Carlo (MCMC) inspired by the cavity method
developed in the study of spin glasses. We focus on the maximum clique problem
and compare the new algorithm with several standard algorithms on some
DIMACS benchmark graphs and on random graphs. The performance of the new
algorithm is quite surprising. We have tried throughout to remain clear also to
readers who are not in the field.

1 Introduction

In the last years Mezard, Parisi, Zecchina ^2j, ^H] introduced a class of optimization 
algorithms to deal with K-satisfiability problems. Their strategy was based on the cavity 
method introduced in spin glass theory a long time ago, and in particular on its
zero-temperature version, more recently developed in [10]. An important ingredient in their
approach seems to be the locally tree-like structure of the interaction graph.

In the case of the clique problem, i.e., the study of the maximal complete subgraph of a 
given graph G, we expect to be very far from a tree-like structure of the interaction graph 
even locally, for instance when G is a random graph. In this paper we introduce a new
algorithm to treat this problem, based again on the cavity method but in a completely
different way.

This algorithm represents a first step in the application of the cavity idea, which will
be developed in a forthcoming paper. On the other hand, the algorithm is sufficiently
simple that its behavior can be studied, at least on random graphs, providing some
explanation of the difficulty of the problem. The algorithm introduced in this paper
is a heuristic search for cliques, in the sense that the optimality of the result is
not guaranteed. For a recent review on the numerical approach to the clique problem see
e.g. and references therein. 

1.1 Definitions 

Let $G = (V, E)$ be a graph. A graph $g$ is a subgraph of $G$, $g \subseteq G$, if its vertex set $V(g) \subseteq V$
and its edge set $E(g) \subseteq E$. For any $A \subseteq V$ we denote by $G[A]$ the graph induced by $A$ in $G$:

$$G[A] = (A, E(G[A])), \quad \text{with } E(G[A]) := \{(i,j) \in E : i, j \in A\} \tag{1}$$

We denote by $\mathcal{K}(G)$ the set of complete subgraphs, or cliques, of $G$ and by $\mathrm{MaxCl}(G)$
the set of maximum cliques in $G$:

$$\mathrm{MaxCl}(G) := \{g \in \mathcal{K}(G) : |V(g)| = \max_{g' \in \mathcal{K}(G)} |V(g')|\} \tag{2}$$

where $|B|$ denotes the cardinality of the set $B$.

We call clique number of the graph $G$, $\omega(G)$, the cardinality of the vertex set of any
maximum clique in $G$, i.e., $\omega(G) = |V(g)|$ with $g \in \mathrm{MaxCl}(G)$.

There are several versions of the problem of determining the clique number
and the set of maximum cliques of a given graph $G$. We recall here the most cited form.

Clique problem: given a graph $G = (V, E)$ and a positive integer $k \le |V|$, does $G$
contain a complete subgraph of size $k$ or more? That is, does $\omega(G) \ge k$ hold?

As is well known (see [1]), the clique problem is NP-complete. Other
famous NP-complete problems are equivalent to the clique problem, such as vertex covering and
independent set, defined as follows:

Vertex covering: given a graph $G = (V, E)$ and a positive integer $k \le |V|$, is there
a vertex cover of size $k$ or less for $G$, i.e., a subset $V' \subseteq V$ with $|V'| \le k$ such that
for each edge $(u, v) \in E$ at least one of $u$ and $v$ belongs to $V'$?

Independent set: given a graph $G = (V, E)$ and a positive integer $k \le |V|$, does $G$
contain an independent set of size $k$ or more, i.e., a subset $V' \subseteq V$ such that $|V'| \ge k$
and such that no two vertices in $V'$ are joined by an edge in $E$?

The equivalence of these problems is proved for instance in [1], Lemma 3.1, p. 54.

1.2 The case of random graphs

Consider the set $\mathcal{G}(n,d)$ of random graphs with fixed density $d$, i.e., of graphs $G(V,E)$
having vertex set $V = \{1, 2, \ldots, n\}$ and in which each edge is present independently
with probability $d$.

To study the size of the largest clique of a graph $G(V,E) \in \mathcal{G}(n,d)$ one can argue as
follows. Let $Y_r$ be the number of complete subgraphs with $r$ vertices in a graph
$G(V,E) \in \mathcal{G}(n,d)$. It is immediate to show that

$$E(Y_r) = \binom{n}{r} d^{\binom{r}{2}} \tag{3}$$

Let us consider the value $r_0(n)$ of $r$ such that $E(Y_r) = 1$. Writing (3) in terms of the Stirling
approximation and denoting $b = 1/d$, this value $r_0(n)$ is given by

$$r_0(n) = 2\log_b n - 2\log_b\log_b n + 2\log_b(e/2) + 1 + o(1) \tag{4}$$

The clique number of a graph $G(V,E) \in \mathcal{G}(n,d)$ tends, for $n \to \infty$, to be very
near to $r_0(n)$. More precisely, it is possible to prove the following result (see £Q): for
almost all graphs $G \in \mathcal{G}(\mathbb{N}, d)$ there is a constant $m_0(G)$ such that for all $n \ge m_0(G)$
and for almost all subgraphs $G_n$ of $G$ with vertex set of size $|V| = n$

$$|\omega(G_n) - 2\log_b n + 2\log_b\log_b n - 2\log_b(e/2) - 1| < \frac{3}{2} \tag{5}$$

Although the asymptotic value of $\omega(G_n)$ has such a small variability, it is
well known that the large cliques of a random graph are very difficult to find. This is due
to the fact that $E(Y_r)$, which has its maximum for an $r$ that is roughly
$r_0(n)/2$, decreases very rapidly when $r > r_0(n)/2$. Hence, while it is easy (e.g. with a
greedy algorithm) to find cliques whose size is of the order of $\log_b n$, the probability that
one such clique is a subset of a clique of size $(1+\epsilon)\log_b n$ is of the order of
$n^{-\alpha(\epsilon)\log n}$ for all $\epsilon > 0$, and hence is more than polynomially small (see also [5]).

This difficulty in finding large cliques of random graphs is already evident numerically
for quite small $n$, as will be shown later.
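
The first-moment estimate above can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the asymptotic formula is evaluated with its $o(1)$ term simply dropped.

```python
from math import comb, e, log


def expected_cliques(n: int, r: int, d: float) -> float:
    """E(Y_r) = C(n, r) * d^(r(r-1)/2), i.e. equation (3)."""
    return comb(n, r) * d ** (r * (r - 1) // 2)


def r0_asymptotic(n: int, d: float) -> float:
    """Right-hand side of equation (4) with the o(1) correction dropped."""
    b = 1.0 / d

    def log_b(x: float) -> float:
        return log(x) / log(b)

    return 2 * log_b(n) - 2 * log_b(log_b(n)) + 2 * log_b(e / 2) + 1


def largest_r_with_expectation_at_least_one(n: int, d: float) -> int:
    """Largest r reached by increasing r while E(Y_{r+1}) >= 1."""
    r = 1
    while expected_cliques(n, r + 1, d) >= 1.0:
        r += 1
    return r


if __name__ == "__main__":
    for n in (128, 1024, 16384):
        print(n, largest_r_with_expectation_at_least_one(n, 0.5), r0_asymptotic(n, 0.5))
```

Comparing the two columns printed for each $n$ gives a feeling for how close the exact first-moment threshold is to the asymptotic expression; the sketch is only meant for moderate $n$, since the binomial coefficient is converted to a float.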

1.3 The statistical mechanics approach 

We recall here very briefly the main ideas of the statistical mechanics approach to
combinatorial optimization problems.

The cost function of the optimization problem (OP) can be viewed as the energy function
$H(x)$, usually called the Hamiltonian, of a statistical mechanics (SM) model in which instances
of the OP are considered as configurations $x \in X$ of the SM model. The optimal
configurations correspond to the ground states in the SM language.
Ground states in SM are the configurations on which the Gibbs measure $\pi(x) = e^{-\beta H(x)}/Z$ is
concentrated in the limit of zero temperature ($\beta \to \infty$, $\beta$ being the inverse temperature);
the normalization constant $Z$ is usually called the partition function. This means that to
determine the ground states it is sufficient to perform a random sampling at low temperature.
To this purpose we can apply the Monte Carlo method. The main idea of this method
is to define a Markov chain Monte Carlo (MCMC) on the configuration space $X$, with
transition probabilities $P(x,x')$ such that the transition probability in $n$ steps, $P^n(x,x')$,
of the chain converges to $\pi(x')$ as $n \to \infty$. This convergence follows from the ergodic theorem
if, for instance, the transition probabilities satisfy a detailed balance condition w.r.t. the
Gibbs measure $\pi$:

$$\pi(x)P(x,x') = \pi(x')P(x',x) \tag{6}$$

The strategy of the MCMC method is then the following:

- start from a configuration $x_0$

- look at the random evolution of the chain starting from it, $x_0, x_1, \ldots, x_n$, for a
"sufficiently long time" $n$

- for the final state $x_n$ we have $P(x_n = x) \approx \pi(x)$.

The main difficulty in applying this procedure is due to metastable states. Indeed, local
minima of the energy $H(x)$ can capture the evolution $x_t$ of the chain for very long time
intervals if the temperature is low. So the main problem in applying the MCMC method is
to define what "sufficiently long time" means.

A strategy to escape the problem of metastable states is to change the temperature
during the evolution of the chain. This is known as simulated annealing. Since at high
temperature the process leaves local minima much more easily, one can look for a suitable
annealing schedule in order to avoid being captured in metastable states. See for instance [6] for
the use of simulated annealing in optimization problems.

From a rigorous point of view, the main point in applying the MCMC method is to
estimate the mixing time of the chain, that is, the time $n$ needed for $P(x_n = x)$
and $\pi(x)$ to be sufficiently close to each other, uniformly in $x_0$. (See for instance [7] for precise
definitions.)

As an example, for the clique problem on a graph $G \in \mathcal{G}(n, 1/2)$ we can consider, as
in [6], the following MCMC. The state space $X$ of the chain is the collection of all cliques in
$G$. To each clique $x \in X$ we associate a weight $w(x) = \lambda^{|x|}$, where $|x|$ denotes the number
of vertices of $x$ and $\lambda \ge 1$ is a real parameter. We can describe this weight in terms of
a Gibbs measure $\pi(x) = e^{-\beta H(x)}/Z$ with $H(x) = -|x|$ and $\lambda = e^{\beta}$. The transition probability
$P(x,x')$ is different from zero only if the cliques $x$ and $x'$ have a symmetric difference
(as sets of vertices) of size less than or equal to one. In this case, if $x' \supset x$ we put $P(x,x') = 1/n$ and
if $x' \subset x$ we put $P(x,x') = 1/(\lambda n)$. The probability $P(x,x)$ is obtained by normalization.
It is immediate to verify that these transition probabilities satisfy the detailed balance
condition (6).
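
One step of a chain of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the cited reference: the graph is assumed to be given as a list of neighbor sets, a vertex is proposed uniformly at random (which produces the factor $1/n$), and the name `jerrum_step` is hypothetical.

```python
import random


def jerrum_step(adj, clique, lam, rng):
    """One step of a Metropolis chain on cliques with weight lam**|x|.

    adj    -- list of sets, adj[v] = neighbors of v (v itself excluded)
    clique -- frozenset of vertices forming a clique
    lam    -- parameter lambda >= 1
    rng    -- random.Random instance
    """
    v = rng.randrange(len(adj))  # each vertex is proposed with probability 1/n
    if v in clique:
        # Removal: accepted with probability 1/lam, so P(x, x') = 1/(lam * n).
        return clique - {v} if rng.random() < 1.0 / lam else clique
    if clique <= adj[v]:
        # Addition keeps the set a clique: always accepted, so P(x, x') = 1/n.
        return clique | {v}
    return clique  # adding v would break the clique: stay put


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
    rng = random.Random(0)
    x = frozenset()
    for _ in range(1000):
        x = jerrum_step(adj, x, 4.0, rng)
    print(sorted(x))
```

With these acceptance rules the ratio of forward and backward probabilities between $x$ and $x \cup \{v\}$ is $\lambda$, which is what detailed balance with respect to the weight $\lambda^{|x|}$ requires.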

For this dynamics Jerrum proves that there exists an initial state from which the
expected time to reach a clique of size at least $(1+\epsilon)\log_2 n$ is super-polynomial in $n$. The
crucial point in this proof is to show that there are few cliques that can grow up to
size $(1+\epsilon)\log_2 n$. More precisely, a clique of size $k$ is called an $m$-gateway if there exists a
path of the chain going from this clique to a clique of size $m$ through cliques of size at
least $k$. It is proved in [6] that the density of $m$-gateways in the set of $k$-cliques is
super-polynomially small for $k = \lceil (1 + \frac{2}{3}\epsilon)\log_2 n \rceil$ and $m = \lceil (1+\epsilon)\log_2 n \rceil$. Since
$m$-gateways have to be visited in order to reach cliques of size $m$ or larger, these
$m$-gateways represent a bottleneck for the dynamics, and their low density can be used to
prove that the mixing time is super-polynomial in $n$.

2 A Hamiltonian for the clique problem, the vertex covering
and the independent set

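We consider the space $X := \{0,1\}^V$ of lattice gas configurations on $V$; on the
configuration space (or state space) $X$ we define an Ising Hamiltonian with an antiferromagnetic
interaction between non-neighbor sites:

$$H(\sigma) := \sum_{(i,j)} J_{ij}\sigma_i\sigma_j - h\sum_{i \in V}\sigma_i \tag{7}$$

where

$$J_{ij} = \begin{cases} 0 & \text{if } (i,j) \in E \\ 1 & \text{if } (i,j) \in E^c \end{cases} \tag{8}$$

with $E^c := \{(i,j) \notin E;\ i,j \in V\}$ and $h > 0$.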

It is easy to prove that if $h < 1$ then the minimal value of $H(\sigma)$ is attained on
configurations with support on the vertices of a maximum clique. First of all we prove
that $H(\sigma)$ is minimal on configurations $\sigma$ such that $G(\sigma) \in \mathcal{K}(G)$. We denote with the
same letter a configuration and its support; for instance, when we write $i \in \sigma$ we mean a
site $i$ in the support of $\sigma$. Indeed, every $\sigma$ such that $G(\sigma) \notin \mathcal{K}(G)$ can be written as $\sigma = C \cup A$ with
$G(C)$ a maximum clique in $G(\sigma)$ and $|A| \ge 1$; then for any $i \in A$ we have $H(\sigma) > H(\sigma \setminus i)$.
This is due to the fact that

$$H(\sigma) = H(\sigma \setminus i) + \sum_j J_{ij}\sigma_j - h \tag{9}$$

and, if $G(C)$ is a maximum clique in $G(\sigma)$, then $i$ has at least one non-neighbor in $C$, so that
$\sum_j J_{ij}\sigma_j \ge \sum_{j \in C} J_{ij} \ge 1 > h$. As a second
step we note that if $\sigma$ is such that $G(\sigma) \in \mathcal{K}(G)$ then $H(\sigma) = -h|\sigma|$, so that we can
immediately conclude that $H$ is minimal on the maximum cliques.
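
The statement can be verified by brute force on a very small graph. The sketch below is only illustrative: the function names and the five-vertex example graph are hypothetical, the sum in the Hamiltonian is taken over unordered pairs of distinct sites, and the enumeration is exponential in the number of vertices.

```python
from itertools import combinations, product


def hamiltonian(sigma, edges, h):
    """H(sigma): one unit per occupied non-adjacent pair, minus h per occupied site."""
    edge_set = {frozenset(e) for e in edges}
    n = len(sigma)
    penalty = sum(
        1
        for i, j in combinations(range(n), 2)
        if sigma[i] and sigma[j] and frozenset((i, j)) not in edge_set
    )
    return penalty - h * sum(sigma)


def ground_states(n, edges, h):
    """All configurations in {0,1}^n of minimal energy (exhaustive search)."""
    configs = list(product((0, 1), repeat=n))
    energies = [hamiltonian(s, edges, h) for s in configs]
    e_min = min(energies)
    return [s for s, en in zip(configs, energies) if abs(en - e_min) < 1e-12]


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    print(ground_states(5, edges, h=0.5))
```

On this toy graph the only maximum clique is the triangle, and with $0 < h < 1$ the search returns exactly the configuration supported on it.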

If we consider the opposite interaction:

$$J_{ij} = \begin{cases} 0 & \text{if } (i,j) \in E^c \\ 1 & \text{if } (i,j) \in E \end{cases} \tag{10}$$

then the same Hamiltonian (7) with interaction $J$ is minimal on configurations with zeros
on a minimal vertex cover and ones on a maximum independent set.

In the case of a random graph $G$, i.e., when the interaction variables $J_{ij}$ are i.i.d. random variables,
the Hamiltonian (7) is similar to the Hamiltonian of the SK model. The main differences
are that our configurations are in lattice gas variables instead of spin variables and that the
interaction variables do not have zero mean. Instead of a symmetry property we now have
control on the sign of the interaction term of the Hamiltonian.

3 Some algorithms for the clique problem 

In this section we define three different algorithms for the clique problem that will be
used for the numerical comparison developed in the final section. The first and the second
are "standard" algorithms; the third is an MCMC defined by means of the
Hamiltonian (7).

3.1 A greedy algorithm, $\mathcal{G}$

The first algorithm we introduce is a fast greedy heuristic, denoted from now on by $\mathcal{G}$.
The underlying idea is to start from a configuration $\sigma$ with $\sigma_i = 0$ for all $i \in V$, select
at random a vertex $j$, set $\sigma_j = 1$, and delete all the vertices not adjacent to $j$. In the next
step another vertex is selected at random among the remaining vertices, and again all its
non-adjacent vertices are deleted. The process stops when no further vertex can be selected,
i.e., when a maximal complete subgraph has been found, that is, a clique not strictly contained
in any other clique.
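
The greedy procedure just described can be sketched in a few lines. This is a sketch and not the authors' C code: the graph is assumed to be a list of neighbor sets, and the name `greedy_clique` is hypothetical.

```python
import random


def greedy_clique(adj, rng):
    """Random greedy heuristic: returns a maximal (not necessarily maximum) clique.

    adj -- list of sets, adj[v] = neighbors of v (v itself excluded)
    rng -- random.Random instance
    """
    candidates = set(range(len(adj)))  # vertices that can still be selected
    clique = set()
    while candidates:
        v = rng.choice(sorted(candidates))  # pick a remaining vertex at random
        clique.add(v)
        candidates &= adj[v]  # delete v and every vertex not adjacent to v
    return clique


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
    rng = random.Random(0)
    best = max((greedy_clique(adj, rng) for _ in range(100)), key=len)
    print(sorted(best))
```

Since a single run is cheap, the heuristic is naturally used by repeating it many times and keeping the largest clique found, as in the usage lines above.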

3.2 A dismantling algorithm, $\mathcal{D}$

The second algorithm, denoted by $\mathcal{D}$ in the following, is another fast heuristic. It starts
with an initial configuration $\sigma$ that has ones everywhere. At each
step the algorithm considers the degree of each vertex $i$ with $\sigma_i = 1$ and selects the one (say $j$) with the smallest
degree. Then it sets $\sigma_j = 0$, decreases by one unit the degree of all the nodes adjacent to $j$
in the graph, and repeats the procedure until the minimum value of all the degrees is $k - 1$,
where $k$ is the number of sites in $\sigma$ with $\sigma_i = 1$, which are then the sites of a clique of cardinality $k$.
Note that in principle the resulting clique may fail to be maximal.

The rationale of this algorithm is to start from the whole graph and then, at each step,
dismantle it vertex by vertex until a clique is found.
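
A possible rendering of the dismantling heuristic is sketched below. It is not the authors' implementation: the graph is assumed to be a list of neighbor sets, the name `dismantle_clique` is hypothetical, and ties between vertices of equal minimum degree are broken by the smallest index, a choice the text does not specify.

```python
def dismantle_clique(adj):
    """Remove minimum-degree vertices until the remaining ones form a clique.

    adj -- list of sets, adj[v] = neighbors of v (v itself excluded)
    """
    alive = set(range(len(adj)))  # sites with sigma_i = 1
    degree = {v: len(adj[v]) for v in alive}  # degree inside the surviving graph
    while alive:
        # Vertex of smallest degree; ties broken by index (an assumption).
        v = min(alive, key=lambda u: (degree[u], u))
        if degree[v] == len(alive) - 1:
            break  # minimum degree is k - 1: the k survivors form a clique
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
    return alive


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
    print(sorted(dismantle_clique(adj)))
```

The procedure is deterministic once the tie-breaking rule is fixed, which is why a single run per instance is enough.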
3.3 A Monte Carlo algorithm, $\mathcal{MC}$

We can apply the ideas developed in Section 1.3 to the Hamiltonian defined in Section 2
for the clique problem. For clarity we consider the Metropolis choice: for $\sigma' \neq \sigma$ we take

$$P(\sigma,\sigma') = q(\sigma,\sigma')\, e^{-\beta[H(\sigma') - H(\sigma)]_+} \tag{11}$$

where $[\cdot]_+$ denotes the positive part and $q(\sigma',\sigma)$ is a symmetric, positive connectivity
matrix independent of $\beta$, with $q(\sigma',\sigma) > 0$ only if $\sigma$ and $\sigma'$ differ in a single site.

We note that in the limit $\beta \to \infty$ with $h \in (0,1)$ fixed, starting from the configuration
which is zero everywhere, this algorithm is equivalent to the greedy algorithm, since
$P(\sigma,\sigma') = 0$ if $H(\sigma') > H(\sigma)$. Thus $\{\sigma(t)\}_{t \in \mathbb{N}}$ is a growing sequence of complete graphs.
In the case $\beta \to \infty$ but $h \to 0$ as $1/\beta$, this Monte Carlo algorithm is equivalent
to the Jerrum algorithm on cliques recalled as an example in Section 1.3.
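
A single-site Metropolis update for the Hamiltonian of Section 2 can be sketched as follows. The sketch makes explicit assumptions that are not fixed by the text: the connectivity matrix is taken uniform ($q = 1/n$ for every single-site flip), the configuration is stored as the set of occupied sites, and the name `metropolis_flip_step` is hypothetical.

```python
import math
import random


def metropolis_flip_step(adj, sigma, h, beta, rng):
    """One Metropolis single-site flip for H = (#occupied non-adjacent pairs) - h*|sigma|.

    adj   -- list of sets, adj[v] = neighbors of v (v itself excluded)
    sigma -- frozenset of occupied sites
    """
    i = rng.randrange(len(adj))  # uniform proposal, q = 1/n (an assumption)
    # Number of occupied sites that are NOT neighbors of i.
    missing = sum(1 for j in sigma if j != i and j not in adj[i])
    # Energy change when occupying i (missing - h) or emptying i (h - missing).
    delta = (missing - h) if i not in sigma else (h - missing)
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return sigma ^ {i}  # accept: toggle the occupation of site i
    return sigma


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
    rng = random.Random(0)
    sigma = frozenset()
    for _ in range(2000):
        sigma = metropolis_flip_step(adj, sigma, h=0.5, beta=3.0, rng=rng)
    print(sorted(sigma))
```

At large $\beta$ and fixed $h \in (0,1)$, moves that raise the energy are essentially never accepted, so starting from the empty configuration the chain only adds vertices compatible with the current clique, which is the greedy behavior noted above.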

4 A new algorithm inspired by the cavity method, C 

In this section we introduce a new algorithm to find maximum cliques of a graph. The key
idea is inspired by the notion of cavity field introduced in statistical mechanics to analyze
the ground states, that is, the configurations minimizing the energy ([11]). The cavity method
at zero temperature is described in detail in [10] in the case of a spin glass on a lattice with
a locally tree-like structure. This method is equivalent to the replica method and can be
used at different levels of approximation, corresponding to the replica symmetric solution
and to the one-step replica symmetry breaking level. The main idea is to compute, in the
limit of an infinite number of spins, the value of the energy density of the ground state by an
iterative procedure. Indeed one can study the effect of the addition of a spin or of a bond
to the system, looking for equations for the corresponding average energy shift.

We do not use the cavity method in our algorithm, but we use the idea that, once a spin is
selected, the effect of the other spins can be described in terms of a local field, which
we will call cavity field, as in the case of the cavity method.

More precisely, consider the Hamiltonian defined in (7) and consider the canonical
ensemble, i.e., the set of configurations $\sigma \in X$ such that $\sum_{i \in V}\sigma_i = k$. Up to a constant
we have that $H(\sigma) = \sum_{(i,j)} J_{ij}\sigma_i\sigma_j$. If for each $i \in V$ we define the cavity field

$$h_i(\sigma) = \sum_j J_{ij}\sigma_j + h(1 - \sigma_i) \tag{12}$$

we have immediately that $H(\sigma) = \sum_{i \in V} h_i(\sigma)\sigma_i$. For a given choice of the fields $\{h_i\}_{i \in V}$
the minimal energy is clearly obtained on the configurations with support on the sites
corresponding to the $k$ minimal values of $h_i$. But here the cavity fields depend on the
configuration itself, and so it is more difficult to determine the ground states. To this
purpose we introduce a new Hamiltonian:

$$H(\sigma,\sigma',h,k) := \sum_{i,j} J_{ij}\sigma_i\sigma'_j + h\Big(k - \sum_i \sigma_i\sigma'_i\Big) \tag{13}$$

defined on pairs of configurations $\sigma, \sigma'$ such that $\sum_i \sigma_i = \sum_i \sigma'_i = k$, with $k \in \mathbb{N}$ and
$h > 0$. The Hamiltonian can be rewritten in terms of the interaction of the configuration
$\sigma'$ with each site $i$ (cavity field $h_i$) in the following way:

$$H(\sigma,\sigma',h,k) = \sum_i h_i(\sigma)\sigma'_i \tag{14}$$

with $h_i = h_i(\sigma)$ defined in (12). Hence the cavity field $h_i$ at the site $i$ represents the
number of sites $j$ with $\sigma_j = 1$ that are not nearest neighbors of the site $i$, plus a
contribution $h$ that is present when the configuration $\sigma$ is not supported on the site $i$. Note that
$H(\sigma,\sigma,h,k)$ corresponds to the Hamiltonian in the framework of the canonical
ensemble corresponding to $k$. We also want to stress that this new Hamiltonian is non-negative
and, if $k \le \omega(G)$, its value is zero (hence minimal) only on pairs of configurations $\sigma, \sigma'$ such
that $\sigma = \sigma'$ with support on a clique with $k$ vertices.

The idea of the algorithm is the following: start from a random configuration $\sigma$ with
fixed $k$, and choose a new configuration $\sigma'$ by picking randomly $k$ sites, each site $i$ having a
relative weight $w_i = e^{-\beta h_i(\sigma)}$ for some $\beta > 0$; set $\sigma'_i = 1$ on these sites and
$\sigma'_i = 0$ on the others. Then repeat this procedure iteratively. After each iteration compute the
quantity $H(\sigma,\sigma',h,k)$. This dynamics defines an MCMC on $X$ that satisfies the detailed
balance condition with respect to the stationary measure

$$\Pi_\sigma = \frac{\sum_\tau e^{-\beta H(\sigma,\tau,h,k)}}{\sum_{\sigma,\tau} e^{-\beta H(\sigma,\tau,h,k)}} \tag{15}$$

Indeed, since each vertex $i$ is chosen to have $\sigma'_i = 1$ with weight $w_i$, the transition
probability of the process $P(\sigma,\sigma')$ has the following form:

$$P(\sigma,\sigma') = \frac{e^{-\beta H(\sigma,\sigma',h,k)}}{\sum_\tau e^{-\beta H(\sigma,\tau,h,k)}} \tag{16}$$

Due to the symmetry of the couplings, $J_{ij} = J_{ji}$, we have

$$\Pi_\sigma P(\sigma,\sigma') = \frac{\sum_\tau e^{-\beta H(\sigma,\tau,h,k)}}{\sum_{\sigma,\tau} e^{-\beta H(\sigma,\tau,h,k)}} \cdot \frac{e^{-\beta H(\sigma,\sigma',h,k)}}{\sum_\tau e^{-\beta H(\sigma,\tau,h,k)}} \tag{17}$$

$$= \frac{\sum_\tau e^{-\beta H(\sigma',\tau,h,k)}}{\sum_{\sigma,\tau} e^{-\beta H(\sigma,\tau,h,k)}} \cdot \frac{e^{-\beta H(\sigma',\sigma,h,k)}}{\sum_\tau e^{-\beta H(\sigma',\tau,h,k)}} = \Pi_{\sigma'} P(\sigma',\sigma) \tag{18}$$

and therefore $\Pi_\sigma$ is the unique stationary measure of our process.

Note that if the parameter $\beta$ is very large and $k \le \omega(G)$, the stationary measure is
exponentially concentrated in $\beta$ on the configurations $\sigma$ whose support is a clique:
indeed, if the support of $\sigma$ is not a clique then $H(\sigma,\tau,h,k) > 0$ for all
configurations $\tau$, and the probability of the configuration $\sigma$ is exponentially small.
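
The cavity fields and the pair Hamiltonian are easy to compute explicitly, and on a tiny graph the transition probabilities can be normalized by brute force. The sketch below is illustrative only: the function names are hypothetical, the diagonal couplings are taken as $J_{ii} = 0$ (an assumption consistent with the field vanishing on the sites of a clique), and the enumeration of all $k$-subsets is feasible only for very small graphs.

```python
from itertools import combinations
from math import exp


def cavity_fields(adj, sigma, h):
    """h_i(sigma): occupied non-neighbors of i, plus h when i itself is empty."""
    return [
        sum(1 for j in sigma if j != i and j not in adj[i]) + (0.0 if i in sigma else h)
        for i in range(len(adj))
    ]


def pair_energy(adj, sigma, sigma_new, h):
    """H(sigma, sigma', h, k) written as the sum of h_i(sigma) over the sites of sigma'."""
    fields = cavity_fields(adj, sigma, h)
    return sum(fields[i] for i in sigma_new)


def transition_row(adj, sigma, h, beta, k):
    """P(sigma, .) over all k-subsets and its normalization, by exhaustive enumeration."""
    states = [frozenset(c) for c in combinations(range(len(adj)), k)]
    weights = [exp(-beta * pair_energy(adj, sigma, s, h)) for s in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}, z


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2-3-4.
    adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
    row, z = transition_row(adj, frozenset({2, 3, 4}), h=0.5, beta=2.0, k=3)
    print(max(row, key=row.get))
```

Since the stationary weight of $\sigma$ is proportional to the normalization returned for $\sigma$, detailed balance can be checked numerically by comparing the product of normalization and transition probability in the two directions.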

4.1 Implementation of the algorithm C and some remarks on its mixing 
time 

To realize a single step of the Markov chain with transition probabilities defined in (16)
we proceed as follows.

1. Starting from a configuration $\sigma$, compute the cavity field $h_i(\sigma)$ for each vertex $i$.

2. To sample the new configuration $\sigma'$ with probability (16) we perform a Kawasaki-like
algorithm $\eta(0), \eta(1), \ldots, \eta(s)$, starting at $\eta(0) = \sigma$. At each step $s$ this
Kawasaki procedure is the following: pick at random a pair of vertices $i, j$ such
that $\eta_i(s) = 1$ and $\eta_j(s) = 0$, and denote by $\eta(s)^{(i,j)}$ the configuration obtained from $\eta(s)$
by exchanging the occupation variables at the sites $i$ and $j$. Then $\eta(s+1) = \eta(s)^{(i,j)}$
with probability $e^{-\beta[h_j(\sigma) - h_i(\sigma)]_+}$, and $\eta(s+1) = \eta(s)$ otherwise. Since $H(\sigma,\eta(s)) = \sum_{i \in V} h_i(\sigma)\eta(s)_i$ we have
$H(\sigma,\eta(s+1)) - H(\sigma,\eta(s)) = h_j(\sigma) - h_i(\sigma)$, so that the invariant measure of this
Kawasaki chain is

$$\Pi_{\sigma,\sigma'} = \frac{e^{-\beta H(\sigma,\sigma',h,k)}}{\sum_\tau e^{-\beta H(\sigma,\tau,h,k)}} \tag{19}$$

as requested in (16).
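
Step 2 can be sketched as a short exchange loop over fixed cavity fields. This is an illustration under stated assumptions rather than the authors' C code: the fields are passed in as a plain list (computed beforehand from $\sigma$), the number of exchange attempts `sweeps` is a free parameter whose appropriate value is not fixed here, and the name `kawasaki_sample` is hypothetical.

```python
import math
import random


def kawasaki_sample(fields, sigma, beta, sweeps, rng):
    """Sample a k-subset with weight exp(-beta * sum of fields) by particle-hole exchanges.

    fields -- list of cavity fields h_i(sigma), kept fixed during the loop
    sigma  -- iterable of occupied sites; its size k is conserved
    sweeps -- number of exchange attempts (free parameter, an assumption)
    """
    eta = set(sigma)
    empty = set(range(len(fields))) - eta
    for _ in range(sweeps):
        if not eta or not empty:
            break  # nothing to exchange
        i = rng.choice(sorted(eta))  # occupied site
        j = rng.choice(sorted(empty))  # empty site
        delta = fields[j] - fields[i]  # energy change of moving the particle i -> j
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            eta.remove(i)
            eta.add(j)
            empty.remove(j)
            empty.add(i)
    return eta


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical fields: three low levels and two high levels.
    print(sorted(kawasaki_sample([0.0, 0.0, 0.0, 5.0, 5.0], {3, 4}, 2.0, 500, rng)))
```

Sorting the occupied and empty sets at every attempt keeps the sketch short and reproducible; an efficient implementation would maintain indexed arrays instead.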
Since this measure $\Pi_{\sigma,\cdot}$ is a product measure, we note that step 2, i.e., the Kawasaki
procedure, quickly reaches its equilibrium, in a time of order $nk$. An estimate of the
mixing time of the chain $\mathcal{C}$ is much more complicated. Here we can only make some initial
remarks on this problem.

First of all we note that the function $H(\sigma(t),\sigma(t+1),h,k)$ is a non-increasing function
of $t$ in the limit $\beta \to \infty$ along a typical path $\{\sigma(t)\}_t$ of the chain $\mathcal{C}$. Indeed, in the limit of
zero temperature, the configuration $\sigma(t+1)$ minimizes the Hamiltonian:

$$\min_\sigma H(\sigma(t),\sigma,h,k) = H(\sigma(t),\sigma(t+1),h,k) = H(\sigma(t+1),\sigma(t),h,k) \tag{20}$$

and $\sigma(t+2)$ is such that

$$\min_\sigma H(\sigma(t+1),\sigma,h,k) = H(\sigma(t+1),\sigma(t+2),h,k) \le H(\sigma(t+1),\sigma(t),h,k) \tag{21}$$

So the trap configurations for the dynamics $\mathcal{C}$ at zero temperature are the configurations
$\sigma$ such that $\min_\tau H(\sigma,\tau,h,k) = H(\sigma,\sigma,h,k)$. The cavity fields $h_i(\sigma)$ take values in the
set $\{q + rh\}_{q \in \{0,1,\ldots,k\},\, r \in \{0,1\}}$, so we can define the different levels of the cavity fields of $\sigma$:
for each $q \in \{0,1,\ldots,k\}$ and $r \in \{0,1\}$ we define $I_{q,r} := \{i \in V : h_i(\sigma) = q + rh\}$. The
configurations $\tau$ minimizing $H(\sigma,\tau,h,k)$ have support on the sites belonging to the lowest
levels of the cavity fields $h(\sigma)$. This means that $\sigma$ is a trap if $h_{\max}(\sigma) := \max_{i \in \sigma} h_i(\sigma) < h_j(\sigma)$
for each $j \notin \sigma$. On the other hand, in the case of random graphs, we know the
distribution of the cavity field at the sites $j \notin \sigma$. Indeed for these sites we have
$h_j(\sigma) = ML_j(\sigma) + h$, where $ML_j(\sigma)$ denotes the number of missing links from $j$ to the set $\sigma$ (the
support of $\sigma$). Since $ML_j(\sigma)$ and $ML_{j'}(\sigma)$ are independent variables for
$j, j' \notin \sigma$, each with a binomial distribution, we also know the distribution of the number
of sites $j \notin \sigma$ with cavity field $h_j(\sigma) = q + h$:

$$P(|I_{q,1}| = l) = \binom{n-k}{l} p^l (1-p)^{n-k-l} \tag{22}$$

where

$$p = p(q,k) := P(ML_j(\sigma) = q) = \binom{k}{q}\frac{1}{2^k} \tag{23}$$

in the case of a random graph with density $\frac{1}{2}$. The quantity

$$G(\sigma) := \min_{j \notin \sigma} h_j(\sigma) - h_{\max}(\sigma) \tag{24}$$

can be called the gap of the trap.

If $k = (1+\epsilon)\log_2 n$ with $\epsilon > 0$ and if $q \ll k$, we have for large $n$

$$P(|I_{q,1}| = 0) = (1-p)^{n-k} \approx 1 - n^{-\epsilon} \tag{25}$$

so with large probability the lowest levels corresponding to $r = 1$, i.e., to sites not in $\sigma$,
are empty.

We notice that in order to really leave a trap, we have to change sufficiently many sites in
a single step of the dynamics. Small changes produce configurations that immediately come
back to the trap. Indeed, starting from a trap $\sigma$ with gap $\gamma$, denote by $\sigma'$ the configuration
obtained in a single step of the dynamics and by $l$ the number of changed sites, i.e.,
$l = |\{i : \sigma_i \neq \sigma'_i\}|$. We have that $|h_i(\sigma) - h_i(\sigma')| \le l + h$ for each site $i$. So if $l < \frac{\gamma}{2} - h$ we
have that the lowest levels of the new cavity field $h(\sigma')$ again contain the sites of $\sigma$,
and so with large probability the dynamics in the following steps will come back to $\sigma$.

Again we can apply the Jerrum argument. If $\sigma$ is almost a clique (i.e., $h_{\max}(\sigma)$ is
small) but the maximal clique contained in $\sigma$, say $\sigma_0$, is of size $k_0 = (1 + \frac{2}{3}\epsilon)\log_2 n$ and $\sigma_0$
is not a $k$-gateway, then with probability near to one we have a gap of $\sigma$ of order $\alpha k$, with $\alpha < 1$
but strictly positive. This means that to escape the trap we have to overcome an energy barrier that
is a positive fraction of $k^2$ since, if $\sigma_0$ is not a $k$-gateway, a number of sites proportional to
$k$ has to be changed in the non-empty levels of type $I_{q,1}$.

For this reason we expect that our algorithm has a non-polynomial mixing time of
order $n^{\alpha\log n}$. However, in the following section we will show that this non-polynomial
mixing time becomes evident only when $n$ is very large. On DIMACS random graphs we
get better results than the other algorithms.

Moreover, from our analysis of traps we can gain a more precise knowledge of the energy
landscape, suggesting improvements of our algorithm. This is the subject of a further paper
where the numerical aspects of the problem will be discussed in more detail.
4.47 16.83

Table 1: User times for DIMACS machine benchmark instances.



5 Numerical comparison 

In this section we briefly give some numerical results on the algorithm introduced in the
previous section. In particular we compare our algorithm with the "standard" ones
recalled in Section 3 on two groups of graphs: DIMACS benchmark graphs [8] and random
graphs. A more complete numerical analysis will be given in 

5.1 Experimental details 

All our algorithms, the greedy $\mathcal{G}$, the dismantling $\mathcal{D}$, the Monte Carlo $\mathcal{MC}$ and the cavity $\mathcal{C}$,
are implemented in C and were run on a 2.5 GHz Power Mac G5 Quad
machine with Mac OS X v10.4 Tiger and 8 GB of RAM; they were compiled with gcc
using the -O2 switch. As required by the rules of the Second DIMACS Implementation
Challenge [8], we report in Table 1 the user times in seconds obtained on one
processor of our computer.

5.2 Numerical results on DIMACS benchmark graphs 

The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) makes
available on its web site (ftp://dimacs.rutgers.edu/pub/challenge/graph/benchmarks)
a suite of 79 benchmark graphs for the maximum clique problem. These benchmarks
constitute an important reference point for evaluating the performance of new
algorithms on this topic. They were generated by means of different criteria, and the set
includes:

• Random graphs (Cn.d and DSJCn.d, n being the size and d the density);

• Steiner triple graphs (MANNn);

• Brockington graphs (brockn_y, with parameter y = 1, 2, 3, 4);

• Sanchis graphs (genn_p0.9_x, sann_0.y_z, sanrn_0.y, with parameters x = 44, 55,
65, 75, y = 5, 7, 9 and z = 1, 2, 3);

• Hamming graphs (hammingx-y, with parameters x = 6, 8, 10 and y = 2, 4);

• Keller graphs (kellerx, with parameter x = 4, 5, 6);

• P-hat graphs (p_hatn-x, with parameter x = 1, 2, 3);

• Pardalos graphs (c-fatn-x, with parameter x = 1, 2, 5, 10).

For additional details and references the reader may see [8].

In Table 2 we report the results for a selection of the 37 instances belonging to the
Second DIMACS Implementation Challenge, and in Table 3 a selection of the remaining
42. The tables are organized as follows. The first column reports the name of the instance,
the following three columns its characteristics, i.e., number of nodes n, number of edges m
and density d. Next, we report the results for the cavity algorithm in terms of best
achieved value and CPU time. The remaining columns give the values achieved
by the algorithms presented in Section 3. In particular, column $\mathcal{G}$ reports the best value
achieved over 100 runs of the greedy algorithm, and columns $\mathcal{D}$ and $\mathcal{MC}$ each report the best
achieved clique size and the CPU time. Note that for the sake of simplicity we do
not report the CPU time for $\mathcal{G}$ because it was always equal to 0.000.

Let us close this section with some final remarks on the performance of the $\mathcal{D}$ and $\mathcal{C}$
algorithms. First of all we want to stress that, despite its simplicity, $\mathcal{D}$ performs quite
well on many instances, especially when the density is high and exact results are difficult
to obtain. As far as $\mathcal{C}$ is concerned, we consider its performance quite promising.

5.3 Numerical results on random graphs 

In order to give a deeper analysis of the performance of our algorithm on random graphs,
our experiments were extended to a collection of large instances built by means of a random
graph generator. In fact, even though the DIMACS collection includes some random
instances, their number of nodes is no greater than 4000. For this reason, we implemented
a random graph generator able to build instances with a fixed number of nodes and
density, limited only by the space occupied by the graph in the physical memory available
on the computer. Our choice was to build a collection of fifteen instances with $n = 2^i$
for $i = 7, 8, \ldots, 14$, i.e., for $n \in \{128, 256, 512, 1024, 2048, 4096, 8192, 16384\}$, and density
$d \in \{0.5, 0.9\}$. The name of each instance consists of the prefix tbb, then the number
of nodes, the density and finally the extension clq.b. Again, note that the instances
follow the rules provided by DIMACS.

These instances are available for further research on the web, on the home page of one
of the co-authors.
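
A generator of this kind can be sketched as follows. This is not the authors' generator: it only builds the adjacency structure in memory as a list of neighbor sets, the name `random_graph` is hypothetical, and writing the result in the DIMACS clq.b format is deliberately left out.

```python
import random


def random_graph(n, d, rng):
    """Graph on n vertices in which each edge is present independently with probability d.

    Returns a list of sets, adj[v] = neighbors of v (no self-loops).
    """
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < d:
                adj[i].add(j)
                adj[j].add(i)
    return adj


if __name__ == "__main__":
    rng = random.Random(2024)
    adj = random_graph(128, 0.5, rng)
    edges = sum(len(a) for a in adj) // 2
    print(edges, edges / (128 * 127 / 2))
```

The memory needed by this representation grows quadratically with $n$ at fixed density, which is the practical limit mentioned above; a bit-matrix representation would be the natural choice for the largest instances.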

On the smaller graphs we obtained the certified values of the clique number by using 
the program Cliquer. This is a branch and bound algorithm given in [Tl]. As it is clear 
from Table 5, the computational times of Cliquer are too long to apply it to the larger
instances.

The results on our instances are reported in Tables 4 and 5, for d = 0.5 and d = 0.9
respectively.
DIMACS benchmarks                                 C        G        D             MC
                   n        m      d  ω(G)    k Time (s)   k    k Time (s)    k  Time (s)
C125.9           125     6963  0.898    34   34    0.060  23   32    0.000   28     0.640
C250.9           250    27984  0.899    44   44    0.270  29   39    0.000   35     2.660
C500.9           500   112332  0.900    57   57    2.670  36   47    0.000   41    15.340
C1000.9         1000   450079  0.901    68   68   20.970  43   53    0.080   46    77.840
C2000.9         2000  1799532  0.900   ≥80   77   75.760  49   56    0.330   52   371.460
DSJC500.5        500    62624  0.502    14   13    1.290   9    8    0.010   11    18.770
DSJC1000.5      1000   499652  0.500    15   15   15.580  10   10    0.070   12    95.970
C2000.5         2000   999836  0.500   ≥16   16    9.900  12    9    0.310   13   409.530
C4000.5         4000  4000268  0.500   ≥18   18  104.020  12   12    1.380   15  1889.440
MANN_a27         378    70551  0.990   126  124    6.600  90  117    0.000  110     5.380
brock200_2       200     9876  0.496    12   12    0.000   8    8    0.000   10     2.330
brock200_4       200    13089  0.658    17   17    0.040  11   12    0.000   14     2.140
brock400_2       400    59786  0.749    29   25    1.050  17   21    0.000   20     9.100
brock400_4       400    59765  0.749    33   25    1.340  17   20    0.010   20     9.040
brock800_2       800   208166  0.651   ≥21   21    0.350  14    7    0.020   17    33.050
brock800_4       800   207643  0.650   ≥21   21    0.610  14   13    0.040   17   333.070
gen200_p0.9_44   200    17910  0.900    44   44    0.360  27   31    0.000   33     1.600
gen200_p0.9_55   200    17910  0.900    55   55    0.030  28   35    0.000   41     1.600
gen400_p0.9_55   400    71820  0.900   ≥55   50    0.130  34   29    0.010   40     7.520
gen400_p0.9_65   400    71820  0.900   ≥65   54    0.030  34   32    0.010   40     7.520
gen400_p0.9_75   400    71820  0.900   ≥75   75    0.120  36   37    0.010   52     7.530
hamming8-4       256    20864  0.639    16   14    0.070  10   16    0.000   16     3.200
keller4          171     9435  0.649    11   11    0.000   8    8    0.000   11     1.340
keller5          776   225990  0.752   ≥27   23    4.570  17   15    0.030   20    41.880
p_hat300-1       300    10933  0.244     8    8    0.350   6    7    0.000    8     4.650
p_hat300-2       300    21928  0.489    25   25    0.150  16   22    0.000   22     4.980
p_hat300-3       300    33390  0.744    36   36    3.160  19   31    0.000   30     4.320
p_hat700-1       700    60999  0.249    11   11    2.330   7    7    0.030   10    38.060
p_hat700-2       700   121728  0.498    44   44    7.690  24   40    0.030   35    39.110
p_hat700-3       700   183010  0.748   ≥62   62   13.570  31   58    0.030   47    37.940
p_hat1500-1     1500   284923  0.253    12   12    0.780   7    9    0.180   11   221.800
p_hat1500-2     1500   568960  0.506   ≥65   65   18.130  30   61    0.180   48   224.230

Table 2: Results for the DIMACS benchmarks (Second challenge set)







DIMACS benchmarks      n         m      d  ω(G)   C k  C Time(s)   Q k   V k  V Time(s)  MC k  MC Time(s)
brock200_1           200     14834  0.745    21    21      1.800    16    16      0.000    17       1.970
brock200_3           200     12048  0.605    15    14      0.360    10    11      0.000    13       2.240
brock400_1           400     59723  0.748    27    25      0.400    16    18      0.010    21       9.150
brock400_3           400     59681  0.748    31    25      0.400    16    17      0.000    20       9.200
brock800_1           800    207505  0.649    23    21      1.390    15    14      0.040    17      55.970
brock800_3           800    207333  0.649    25    22      0.880    14    14      0.040    18      55.920
c-fat200-1           200      1534  0.077    12    12      0.000     8    12      0.000    12       1.740
c-fat200-2           200      3235  0.163    24    24      0.010    14    24      0.000    23       1.750
c-fat200-5           200      8473  0.426    58    58      0.010    32    58      0.000    48       1.790
c-fat500-10          500     46627  0.374   126   126      0.020    66   126      0.010    96      14.900
c-fat500-1           500      4459  0.036    14    14      0.000     9    14      0.010    12      14.760
c-fat500-2           500      9139  0.073    26    26      0.020    15    26      0.010    22      14.950
c-fat500-5           500     23191  0.186    64    64      0.020    34    64      0.010    51      14.660
hamming6-2            64      1824  0.905    32    32      0.020    18    32      0.000    29       0.160
hamming6-4            64       704  0.349     4     4      0.000     3     4      0.000     4       0.210
hamming8-2           256     31616  0.969   128   128      0.020    58   128      0.000    97       2.270
johnson8-2-4          28       210  0.556     4     4      0.010     3     4      0.000     4       0.050
johnson8-4-4          70      1855  0.768    14    14      0.010     9     8      0.000    14       0.220
johnson16-2-4        120      5460  0.765     8     8      0.020     7     8      0.000     8       0.620
johnson32-2-4        496    107880  0.879    16    16      0.890    15    16      0.010    16      14.350
MANN_a9               45       918  0.927    16    16      0.000    12    12      0.000    16       0.090
p_hat500-1           500     31569  0.253     9     9      0.220     7     7      0.010     9      18.170
p_hat500-2           500     62946  0.505    36    36      0.120    18    32      0.010    29      18.940
p_hat500-3           500     93800  0.752   ≥50    50      2.280    26    46      0.010    39      17.080
p_hat1000-1         1000    122253  0.245   ≥10    10      2.280     7     8      0.080     9      92.480
p_hat1000-2         1000    244799  0.490   ≥46    46      0.270    23    42      0.070    36      94.620
p_hat1000-3         1000    371746  0.744   ≥68    68      1.950    33    55      0.070    49      89.710
san200_0.7_1         200     13930  0.700    30    30      0.010    16    15      0.000    16       2.070
san200_0.7_2         200     13930  0.700    18    15      0.070    13    12      0.000    13       2.060
san200_0.9_1         200     17910  0.900    70    62      0.020    39    45      0.000    45       1.600
san200_0.9_2         200     17910  0.900    60    60      0.020    31    35      0.000    45       1.610
san200_0.9_3         200     17910  0.900    44    42      0.010    26    24      0.000    30       1.600
san400_0.9_1         400     71820  0.900   100    96      0.020    48    50      0.000    51       7.360
sanr200_0.7          200     13868  0.697    18    18      0.080    12    16      0.000    16       2.070
sanr200_0.9          200     17863  0.898    42    42      0.150    27    36      0.000    34       1.610
sanr400_0.5          400     39984  0.501    13    13      0.910     9     8      0.000    11      10.750
sanr400_0.7          400     55869  0.700    21    21      0.150    14    16      0.000    17       9.550



Table 3: Results for the DIMACS benchmarks (continued)
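
As a reading aid for the n, m and d columns, the short Python sketch below recomputes the density of a few instances from their vertex and edge counts. It assumes that d is the usual edge density of a simple undirected graph, d = 2m / (n(n - 1)); this formula is not stated in this part of the text and is only checked here against rows copied from Table 3. The function name `density` and the list `rows` are illustrative choices, not part of the paper.

```python
def density(n: int, m: int) -> float:
    """Edge density of a simple undirected graph with n vertices and m edges.

    Assumption: d = 2m / (n(n - 1)), i.e. the fraction of all possible
    vertex pairs that are joined by an edge.
    """
    if n < 2:
        raise ValueError("density is undefined for fewer than two vertices")
    return 2 * m / (n * (n - 1))


# (instance, n, m, tabulated d) copied from Table 3
rows = [
    ("brock200_1", 200, 14834, 0.745),
    ("hamming6-2", 64, 1824, 0.905),
    ("johnson8-2-4", 28, 210, 0.556),
    ("MANN_a9", 45, 918, 0.927),
]

for name, n, m, d in rows:
    print(f"{name}: computed {density(n, m):.3f}, tabulated {d:.3f}")
```

For these four rows the recomputed value agrees with the tabulated one to three decimals, which supports reading d as the edge density.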







Random graphs          n         m      d   C k  C Time(s)   Q k   V k  V Time(s)  MC k  MC Time(s)
tbb128.5             128      4061  0.500    11      0.010     7     8      0.000    11       7.520
tbb256.5             256     16310  0.500    12      0.510     9     9      0.000    11      30.750
tbb512.5             512     65457  0.500    13      1.150     9    10      0.000    12     131.980
tbb1024.5           1024    262084  0.500    15      2.170    10     9      0.060    13     589.350
tbb2048.5           2048   1048289  0.500    16      9.280    11    10      0.222    14    2564.550
tbb4096.5           4096   4192863  0.500    17     73.840    12    11      1.870    15   11061.950
tbb8192.5           8192  16778527  0.500    19    311.950    13    11      8.130    16   50238.440
tbb16384.5         16384  67106538  0.500    19    170.470    14    11     33.850    17  216040.910



Table 4: Results for random graphs with density d = 0.5







Random graphs          n         m      d   C k  C Time(s)   Q k   V k  V Time(s)  MC k  MC Time(s)
tbb128.9             128      7315  0.900    34      0.080    24    29      0.000    30       5.860
tbb256.9             256     29392  0.900    44      1.310    28    37      0.000    36      23.390
tbb512.9             512    117794  0.900    56      3.190    37    46      0.010    42      98.940
tbb1024.9           1024    471440  0.900    67      9.720    42    49      0.060    48     445.620
tbb2048.9           2048   1886256  0.900    76     35.740    49    55      0.222    54    1958.010
tbb4096.9           4096   7548970  0.900    84     41.780    56    57      1.860    59    8307.030
tbb8192.9           8192  30198965  0.900    90    182.350    61    63      8.120    66   35811.920



Table 5: Results for random graphs with density d = 0.9





Random graphs        C k  C Time(s)  Cliquer k  Cliquer Time(s)
tbb128.5              11      0.010         11            0.000
tbb256.5              12      0.510         12            0.130
tbb512.5              13      1.150         14           10.540
tbb1024.5             15      2.170         15         1769.910
tbb128.9              34      0.080         34           61.570



Table 6: Comparison between C and Cliquer on some random instances 
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